First we derive entropy from the log-likelihood of observed data. Assume we have a data
source in which each symbol $x_i$ has probability $p(x_i) = p_i$. Then the geometric mean
likelihood of a randomly chosen string of $N$ symbols is:

$$P = (p_i \, p_j \, p_k \cdots)^{1/N}$$

Now assume that, when $N$ is large, symbol $i$ occurs on average $p_i N$ times in the string.

$$P = \left( p_1^{p_1 N} \, p_2^{p_2 N} \cdots p_m^{p_m N} \right)^{1/N}$$

$$P = \prod_i p_i^{p_i} = p_1^{p_1} \, p_2^{p_2} \cdots p_m^{p_m}$$

Taking the logarithm gives

$$\log(P) = \sum_i p_i \log(p_i)$$

This measure is never positive, so we fix the sign with a minus to make it non-negative. This
also makes it identical to entropy, which (with base-2 logarithms) measures the average number
of bits required per symbol when the signal is encoded.

$$H(x) = -\log(P) = -\sum_i p_i \log(p_i)$$
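
The derivation above can be checked numerically. The following is a minimal sketch, not part of the original note: the function names `entropy_bits` and `geometric_mean_likelihood` and the example distribution are assumptions chosen for illustration. It assumes base-2 logarithms and that symbol i occurs exactly p_i N times in the string.

```python
import math

def entropy_bits(p):
    """Shannon entropy H = -sum_i p_i log2(p_i); zero-probability symbols contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def geometric_mean_likelihood(p, n):
    """Geometric mean likelihood of an n-symbol string in which
    symbol i occurs exactly p_i * n times (the large-N assumption)."""
    log_likelihood = sum(pi * n * math.log(pi) for pi in p if pi > 0)
    return math.exp(log_likelihood / n)

# Hypothetical example distribution over three symbols.
p = [0.5, 0.25, 0.25]
h = entropy_bits(p)
P = geometric_mean_likelihood(p, 1000)

# Both values agree: H(x) = -log2(P) = 1.5 bits per symbol.
print(h, -math.log2(P))
```

Working in log space avoids the underflow that would occur if the N-symbol likelihood were multiplied out directly.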

Now assume instead that we want to compare two distributions $p$ and $q$. In this case, we can
measure the mismatch between them with a product of ratios such as:

$$R = \prod_i \max\!\left( \frac{p_i}{q_i}, \frac{q_i}{p_i} \right)^{(p_i + q_i)/2}$$


Here we take, for each symbol, the larger of the two ratios $p_i/q_i$ and $q_i/p_i$, and weight
it by the symbol's average probability $(p_i + q_i)/2$ when the two distributions are blended
together.


$$\log(R) = \sum_i \frac{p_i + q_i}{2} \left| \log\frac{p_i}{q_i} \right|$$
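
As a rough illustration of this measure, here is a minimal sketch that assumes natural logarithms and that every p_i and q_i is strictly positive (otherwise the ratios are undefined). The function name `log_max_ratio` and the example distributions are hypothetical and not from the original note. It relies on the identity log max(a, 1/a) = |log a|.

```python
import math

def log_max_ratio(p, q):
    """log R = sum_i (p_i + q_i)/2 * |log(p_i / q_i)|.

    Assumes p and q have the same length and strictly positive entries.
    """
    return sum(0.5 * (pi + qi) * abs(math.log(pi / qi)) for pi, qi in zip(p, q))

# Hypothetical example distributions.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

print(log_max_ratio(p, q))  # 0.75 * ln(2), since two symbols differ by a factor of 2
print(log_max_ratio(p, p))  # identical distributions give 0
```

The measure is symmetric in p and q, and every term is non-negative, so a mismatch in one symbol cannot be offset by another.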



This entropy-like measure is then always a better way to compare distributions than the
KL-divergence measure, where the ratios of different symbols may cancel each other (longer
encoding bit lengths for some symbols may be compensated by shorter ones elsewhere) because we
do not take the maximum of the ratios:


$$R' = \prod_i \left( \frac{p_i}{q_i} \right)^{(p_i + q_i)/2}$$

$$\log(R') = \sum_i \frac{p_i + q_i}{2} \log\frac{p_i}{q_i}$$
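
The cancellation described above can be seen in a small numeric example. This is a sketch under stated assumptions: natural logarithms, strictly positive probabilities, and hypothetical function names (`log_signed_ratio`, `log_abs_ratio`) and example distributions that do not appear in the original note.

```python
import math

def log_signed_ratio(p, q):
    """log R' = sum_i (p_i + q_i)/2 * log(p_i / q_i); terms may have either sign."""
    return sum(0.5 * (pi + qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def log_abs_ratio(p, q):
    """log R = sum_i (p_i + q_i)/2 * |log(p_i / q_i)|; every term is non-negative."""
    return sum(0.5 * (pi + qi) * abs(math.log(pi / qi)) for pi, qi in zip(p, q))

# Hypothetical example: q swaps the probabilities of the first two symbols of p.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

signed = log_signed_ratio(p, q)   # the +log(2) and -log(2) terms cancel to about 0
absolute = log_abs_ratio(p, q)    # 0.75 * ln(2): the mismatch stays visible

print(signed, absolute)
```

In this example the signed sum reports no mismatch even though the distributions clearly differ, while the absolute-valued sum does not.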


My thanks go to Tommi Jauhiainen, who introduced me to this idea, although I did the necessary
math myself to understand why absolute-valued logarithms in KL-divergence are always better
when comparing distributions.